Abstract The analysis of causality is important in many fields of research. I propose a causal theory
to obtain the causal effects in a causal loglinear model. It calculates them using the odds ratio and
Pearl's causal theory. The effects are calculated distinguishing between a simple mediation model
(model without the multiplicative interaction effect) and a mediation model with the multiplicative
interaction effect. In both models it is also possible to analyze the cell effect, which is a new interaction
effect. A causal loglinear model therefore has three interaction effects: the multiplicative interaction
effect, the additive interaction effect and the cell effect.


Introduction 

The analysis of causality is important in many fields of research, for example in economics and in
social sciences, because the analyst seeks to understand the mechanisms of the analyzed phenomena
using the relations among the variables (i.e. the cause-effect relations, where some variables are the
causes and other variables the effects). These variables can influence other variables directly, indirectly
or in both ways. The set of all effects which influence a variable is called the total effect. The direct effect is
the effect of a variable on another variable without any intervening variables, while the indirect effect
is the effect of a variable on another variable considering only the effect through the intervention of
other variables, called mediators. Wright (1921) defines a diagram for the causal relations, which he
calls "path diagram". In the path diagram, the direct causal relation between 2 variables is represented
by an arrow which goes from the influencing variable to the influenced variable. If two variables are
not connected, then there is no direct causal relation between them. The correlation between two
variables is represented by a double arrow. To better explain the direct, indirect and total effects, I
use the path diagram represented in Figure 1: the arrow which goes from X to Y represents the direct
effect of X on Y, the two arrows which go from X to Z and from Z to Y represent the indirect effect of X
on Y through Z, and the arrow which goes from Z to Y represents the direct effect of Z on Y. The
indirect effect is thus the effect of X on Y mediated by Z. An analyst who is interested in the variable
Y will want to understand what affects Y and will therefore study the direct, indirect and total
effects.

It is possible to complicate these effects by introducing the concept of interaction. The interaction
occurs when the effect of one cause-variable may depend in some way on the presence or absence
of another cause-variable. In the literature the interaction effect can be measured on the additive or
multiplicative scale, and in many cases it implies that the effect of one variable on another varies by
levels of a third, and vice versa. Figure 2 shows the path diagram of the interaction, where X and Z
influence Y directly but their joint effect XZ also influences Y. Both interaction effects can be present in
a model. A limitation of the loglinear models is the inability to calculate all these effects. In this paper
I propose a causal theory which provides a method for
calculating such effects in a loglinear model.


Causal loglinear model with or without multiplicative interaction 


Figure 1: Simple mediation model

Figure 2: Simple interaction model

Before introducing the method to calculate the effects, I explain the transition from a loglinear model to
a causal loglinear model, i.e. a loglinear model where the variables have a causal role:
for example, X becomes the cause and Y the effect. Vermunt (1996), indeed, divides the loglinear
models into these 2 types, which he calls respectively loglinear models and causal loglinear models.
The loglinear model describes the observed frequencies: it does not distinguish between dependent and
independent variables, and it measures the strength of the association among variables. The causal
loglinear model, introduced by Goodman (1973) and also called "modified path analysis approach", is
a loglinear model which considers a causal order of the variables a priori. This model, as written by
Vermunt (2005), consists of specifying a "recursive" system of logit models. In this system the variable
which appears as dependent in a particular logit equation may appear as one of the independent
variables in one of the next equations. For simplicity, I consider a model with 3 categorical variables,
X, Z and Y. The joint probability in multiplicative form is

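$$P(X = x, Z = z, Y = y) = \pi^{X=x,Z=z,Y=y} = \eta\,\mu^{X=x}\mu^{Y=y}\mu^{Z=z}\mu^{X=x,Y=y}\mu^{X=x,Z=z}\mu^{Z=z,Y=y}\mu^{X=x,Z=z,Y=y}$$

which can be written also in additive form

$$\log P(X = x, Z = z, Y = y) = \log\eta + \log\mu^{X=x} + \log\mu^{Y=y} + \log\mu^{Z=z} + \log\mu^{X=x,Y=y} + \log\mu^{X=x,Z=z} + \log\mu^{Z=z,Y=y} + \log\mu^{X=x,Z=z,Y=y}$$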

Now I suppose that X, Z and Y are binary (0 or 1), and I consider the dummy code, that is:

$$\mu^{X=0} = \mu^{Y=0} = \mu^{Z=0} = 1$$

$$\mu^{Z=0,Y=0} = \mu^{X=0,Y=0} = \mu^{Z=0,Y=1} = \mu^{X=0,Y=1} = \mu^{Z=1,Y=0} = \mu^{X=1,Y=0} = 1$$

$$\mu^{X=0,Z=0} = \mu^{X=0,Z=1} = \mu^{X=1,Z=0} = 1$$

$$\mu^{X=0,Z=0,Y=i} = \mu^{X=0,Z=1,Y=i} = \mu^{X=1,Z=0,Y=i} = \mu^{X=1,Z=1,Y=0} = 1, \quad i = 0, 1$$

The joint probability is shown in table 1. 
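The construction of the joint probability under the dummy code can be sketched in base R. This is only an illustration: the parameter values in `mu` are hypothetical and are not taken from the paper, and the normalizing constant is obtained simply by forcing the eight cell probabilities to sum to 1.

```r
# Hypothetical multiplicative parameters (illustrative values only)
mu <- list(X = 1.5, Z = 2, Y = 0.2, XZ = 1.2, XY = 0.5, ZY = 3, XZY = 1)

# All cells of the 2 x 2 x 2 table
grid <- expand.grid(x = 0:1, z = 0:1, y = 0:1)

# Dummy code: a parameter contributes only when all of its indices equal 1,
# otherwise it is raised to the power 0 and equals 1
unnorm <- with(grid,
  mu$X^x * mu$Z^z * mu$Y^y *
  mu$XZ^(x * z) * mu$XY^(x * y) * mu$ZY^(z * y) *
  mu$XZY^(x * z * y))

eta <- 1 / sum(unnorm)        # normalizing constant
grid$prob <- eta * unnorm     # P(X = x, Z = z, Y = y)
print(grid)
```

With this coding the probability of the cell X = 0, Z = 0, Y = 0 is the normalizing constant itself, because every parameter of that cell equals 1.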

Now I consider the model of Figure 1, which gives a priori information on the causal order. To
consider the model of Figure 1 in loglinear terms, however, I must suppose that the three-interaction
term is equal to 1 because, if it is present, it introduces the causal multiplicative interaction term of X
and Z on Y (Figure 2). The presence or absence of this parameter, indeed, determines the presence or
absence of the multiplicative interaction. The multiplicative interaction is measured by calculating the
following ratio of odds ratios:


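$$\frac{\dfrac{\Pr(Y=1|X=1,Z=1)}{1-\Pr(Y=1|X=1,Z=1)}\,\dfrac{1-\Pr(Y=1|X=0,Z=0)}{\Pr(Y=1|X=0,Z=0)}}{\left[\dfrac{\Pr(Y=1|X=0,Z=1)}{1-\Pr(Y=1|X=0,Z=1)}\,\dfrac{1-\Pr(Y=1|X=0,Z=0)}{\Pr(Y=1|X=0,Z=0)}\right]\left[\dfrac{\Pr(Y=1|X=1,Z=0)}{1-\Pr(Y=1|X=1,Z=0)}\,\dfrac{1-\Pr(Y=1|X=0,Z=0)}{\Pr(Y=1|X=0,Z=0)}\right]}$$

i.e. the odds ratio of the joint change of X and Z divided by the product of the odds ratios of the two
separate changes, all relative to the cell X = 0, Z = 0. If this ratio is equal to 1, then there is no
multiplicative effect, and this occurs only if $\mu^{X=1,Z=1,Y=1}$ is
equal to 1 or $\log\mu^{X=1,Z=1,Y=1}$ is equal to 0. This interaction effect can be interpreted as the interaction
effect of the traditional linear model. Following the probability structure proposed by Goodman (1973),
the causal model of Figure 1 can be written P(X, Z, Y) = P(Y|Z, X)P(Z|X)P(X): the causal model is,
then, a decomposition of the joint probability into conditional probabilities.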
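As a quick numerical check, the following base R sketch evaluates the multiplicative interaction measure, taken here as OR(1,1) / [OR(1,0) OR(0,1)] with every odds ratio relative to the cell X = 0, Z = 0, and compares it with the three-interaction parameter. All parameter values are hypothetical.

```r
odds <- function(p) p / (1 - p)

# P(Y = 1 | X = x, Z = z) under dummy code; default values are hypothetical
p_y <- function(x, z, muY = 0.3, muXY = 2, muZY = 1.5, muXZY = 1.8) {
  a <- muY * muXY^x * muZY^z * muXZY^(x * z)   # conditional odds of Y = 1
  a / (1 + a)
}

# Odds ratio of the cell (x, z) relative to the reference cell (0, 0)
or_xz <- function(x, z) odds(p_y(x, z)) / odds(p_y(0, 0))

# Ratio of odds ratios measuring the multiplicative interaction
ratio <- or_xz(1, 1) / (or_xz(1, 0) * or_xz(0, 1))
print(ratio)
```

Under these assumptions the ratio coincides with the value given to `muXZY`, so it is 1 exactly when the three-interaction parameter is 1.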
Table 1: The joint probability

Table 2: Marginal table XY given Z = 1


Now I consider the relation between the loglinear model and the causal loglinear model. I calculate
the conditional probabilities using the joint probability and the marginal probabilities. For example
the conditional probability of Y=1 given X=1, Z=1, i.e. $\pi^{Y=1|X=1,Z=1}$, is calculated using table 2 and
constraining the three-interaction term equal to 1:

$$\pi^{Y=1|X=1,Z=1} = \frac{\pi^{Y=1,X=1|Z=1}}{\pi^{Y=1,X=1|Z=1} + \pi^{Y=0,X=1|Z=1}} = \frac{\mu^{Y=1}\mu^{X=1,Y=1}\mu^{Z=1,Y=1}}{1 + \mu^{Y=1}\mu^{X=1,Y=1}\mu^{Z=1,Y=1}}$$

For simplicity, I write this conditional probability as

$$\eta^{Y|X=1,Z=1}\mu^{Y=1}\mu^{X=1,Y=1}\mu^{Z=1,Y=1}$$

which I call causal form, where

$$\eta^{Y|X=1,Z=1} = \frac{1}{1 + \mu^{Y=1}\mu^{X=1,Y=1}\mu^{Z=1,Y=1}}$$

can be seen as a normalization factor. This can be proved recalling that the sum of conditional
probabilities P(Y = 1|X = 1, Z = 1) and P(Y = 0|X = 1, Z = 1) is equal to 1. If I write the
probabilities in causal form I have

$$\begin{cases} P(Y = 1|X = 1, Z = 1) = \eta^{Y|X=1,Z=1}\mu^{Y=1}\mu^{X=1,Y=1}\mu^{Z=1,Y=1} \\ P(Y = 0|X = 1, Z = 1) = \eta^{Y|X=1,Z=1}\mu^{Y=0}\mu^{X=1,Y=0}\mu^{Z=1,Y=0} = \eta^{Y|X=1,Z=1} \end{cases}$$

where, in this case, I do not assume particular values for $\eta^{Y|X=x,Z=z}$. The sum of the two conditional
probabilities is equal to $\eta^{Y|X=1,Z=1}(1 + \mu^{Y=1}\mu^{X=1,Y=1}\mu^{Z=1,Y=1})$. Recalling that this sum must be equal to 1, I
obtain that $\eta^{Y|X=1,Z=1}$ is equal to $(1 + \mu^{Y=1}\mu^{X=1,Y=1}\mu^{Z=1,Y=1})^{-1}$, which is exactly the value which I
obtain rewriting the conditional probability in causal form. For this reason, $\eta^{Y|X=x,Z=z}$ can be seen as
a normalization factor. The conditional probability P(Z = z|X = x) is calculated using the table XZ.
Then I write the marginal probability of X and the conditional probabilities of Z and Y in causal form:




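$$\pi^{X=x} = \eta_c\,\mu_c^{X=x} \qquad (1)$$

$$\pi^{Z=z|X=x} = \eta_c^{Z|X=x}\mu_c^{Z=z}\mu_c^{X=x,Z=z} \qquad (2)$$

$$\pi^{Y=y|X=x,Z=z} = \eta^{Y|X=x,Z=z}\mu^{Y=y}\mu^{X=x,Y=y}\mu^{Z=z,Y=y} \qquad (3)$$

where for example the ratio between the causal one-effect parameter $\mu_c^{Z=1}$ and the no causal one-effect
parameter $\mu^{Z=1}$ is $\eta^{Y|X=0,Z=0}/\eta^{Y|X=0,Z=1}$, and the ratio between the causal two-effects parameter $\mu_c^{X=1,Z=1}$
and the no causal two-effects parameter $\mu^{X=1,Z=1}$ is

$$\frac{\eta^{Y|X=1,Z=0}\,\eta^{Y|X=0,Z=1}}{\eta^{Y|X=0,Z=0}\,\eta^{Y|X=1,Z=1}}$$

The causal normalization factors $\eta^{Y|X=x,Z=z}$ are calculated so: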
Table 3: Marginal table XY given Z = 0

Table 4: Marginal table XY
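The link between the no causal and the causal parameters of Z can be checked numerically. The sketch below, in base R, uses hypothetical parameter values, assumes the dummy code and no three-interaction term, and takes each normalization factor of Y as 1 / (1 + conditional odds of Y = 1). It compares the ratios of normalization factors with the odds computed directly from the XZ marginal table.

```r
# Hypothetical no causal parameters (dummy code, no three-interaction term)
muX <- 1.5; muZ <- 0.7; muY <- 0.4
muXZ <- 1.3; muXY <- 2; muZY <- 0.5

# Normalization factor of P(Y | X = x, Z = z)
eta_y <- function(x, z) 1 / (1 + muY * muXY^x * muZY^z)

# Causal parameters implied by the ratios of normalization factors
muZ_c  <- muZ  * eta_y(0, 0) / eta_y(0, 1)
muXZ_c <- muXZ * (eta_y(1, 0) * eta_y(0, 1)) / (eta_y(0, 0) * eta_y(1, 1))

# Direct check on the XZ marginal of the (unnormalized) joint table
g <- expand.grid(x = 0:1, z = 0:1, y = 0:1)
g$w <- with(g, muX^x * muZ^z * muY^y *
               muXZ^(x * z) * muXY^(x * y) * muZY^(z * y))
xz <- xtabs(w ~ x + z, data = g)
odds_z_x0 <- xz["0", "1"] / xz["0", "0"]   # odds of Z = 1 given X = 0
odds_z_x1 <- xz["1", "1"] / xz["1", "0"]   # odds of Z = 1 given X = 1

print(c(muZ_c = muZ_c, muXZ_c = muXZ_c))
```

The two routes agree: the odds of Z = 1 given X = 0 reproduce the causal one-effect parameter, and the ratio of the two conditional odds reproduces the causal two-effects parameter.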
Odds ratio and causal loglinear model


In the loglinear model, the causal effects are considered only partially and for this reason a true causal
analysis is not made. For the causal model of Figure 1, Bergsma et al. (2009) calculate the
total effect from the marginal table XY (table 4) and the direct effect from the 2 marginal tables XY given
Z = z (tables 2 and 3) using the odds ratio. The odds ratio describes the relationship between binary
variables; if the variables are categorical, they must be transformed into binary variables before it can
be used. For example, if I want to analyze the relation between X and Y, which are categorical variables
with 5 categories, I transform them into binary variables: the transformed X and Y are equal to 1 if their
original value is 5, 0 otherwise. The relationships considered by the odds ratio can be associative
or causal (Zhang (2008)): in the first type the relation is measured using the actual response variable,
while in the second using the potential response. If the two types of odds ratio are different, this is
due to the influence of a third variable called confounding variable (Zhang (2008); Szumilas (2010)).
This confounding variable is causally linked to the response variable, but either it is not causally related to
the other cause or it is causally linked to it without being a mediator variable (Szumilas (2010)). For example, if X
and Z influence Y, and X and Z are correlated (a link which is not of causal type), Z is a confounding
variable of the relation between X and Y. Then in a simple mediation model without confounders, the
total effect (TE) and the direct effect used in the loglinear literature (LDE) are given by the following
formulas:


$$OR^{TE}_{x,x'} = \frac{P(Y|X = x')}{1 - P(Y|X = x')}\,\frac{1 - P(Y|X = x)}{P(Y|X = x)} \qquad (4)$$

$$OR^{LDE}_{x,x'}(Z) = \frac{P(Y|X = x', Z = z)}{1 - P(Y|X = x', Z = z)}\,\frac{1 - P(Y|X = x, Z = z)}{P(Y|X = x, Z = z)} \qquad (5)$$


where the subscript x, x' indicates that the odds ratio measures the effect of the variation of X from x
to x'. I note that they coincide with the definitions of total effect and controlled direct effect proposed
by Pearl (2001, 2009, 2012). I recall, however, that Pearl never uses the odds ratio to calculate the
effects, but prefers to calculate them using the conditional moments. For this reason, I propose a causal
analysis for the loglinear models, applying Pearl's theory to the odds ratio. Using the dummy code,
the total effect is equal to


$$OR^{TE}_{0,1} = \underbrace{\mu^{X=1,Y=1}}_{\text{direct effect}}\;\frac{\eta^{Y|X=0,Z=0} + \mu_c^{Z=1}\eta^{Y|X=0,Z=1}}{\eta^{Y|X=1,Z=0} + \mu_c^{Z=1}\mu_c^{X=1,Z=1}\eta^{Y|X=1,Z=1}}\;\frac{\eta^{Y|X=1,Z=0} + \mu_c^{Z=1}\mu_c^{X=1,Z=1}\mu^{Z=1,Y=1}\eta^{Y|X=1,Z=1}}{\eta^{Y|X=0,Z=0} + \mu_c^{Z=1}\mu^{Z=1,Y=1}\eta^{Y|X=0,Z=1}}$$

and the direct effect used in the loglinear literature is always equal to the causal two-effects parameter
$\mu^{X=1,Y=1}$, i.e. it is independent of the value of the variable Z. If in a linear-in-parameters model
without interaction the variable X and the variable Z influence Y but X does not influence Z, the total
effect of X on Y is equal to the direct effect of X on Y. This is not true in a loglinear model without
interaction: I find, indeed, that when $\mu^{X=1,Z=1,Y=1} = 1$ the total effect is not equal to the direct effect,
but there is another effect, which I call cell effect. The cell effect is present only if more than one variable
influences the same variable, as in this case where X and Z influence Y. The cell effect formula is:


$$Cell^{effect}_{x,x'}(Z) = \frac{\sum_z P(Y|X = x', Z = z)P(Z = z|X = x)}{1 - \sum_z P(Y|X = x', Z = z)P(Z = z|X = x)}\;\frac{1 - \sum_z P(Y|X = x, Z = z)P(Z = z|X = x)}{\sum_z P(Y|X = x, Z = z)P(Z = z|X = x)}\;\left[\frac{P(Y|X = x', Z = z)}{1 - P(Y|X = x', Z = z)}\,\frac{1 - P(Y|X = x, Z = z)}{P(Y|X = x, Z = z)}\right]^{-1} \qquad (6)$$


The cell effect is not linked to the interaction calculated in additive form. The additive interaction in a
loglinear model is obtained by this formula:

$$\pi^{Y=1|X=1,Z=1} - \pi^{Y=1|X=0,Z=1} - \pi^{Y=1|X=1,Z=0} + \pi^{Y=1|X=0,Z=0}$$

In a loglinear model without the multiplicative interaction with dummy code, the additive interaction
effect is linked to linearity (appendix A) and is equal to 0 in these 3 cases: in the first case if
the two-effects parameter between Y and X is equal to 1 (i.e. $\mu^{X=1,Y=1} = 1$), in the second case if the
two-effects parameter between Y and Z is equal to 1 (i.e. $\mu^{Z=1,Y=1} = 1$) and in the third case if the
two-effects parameter between Y and Z is equal to $1/[(\mu^{Y=1})^2\mu^{X=1,Y=1}]$. Of course when there is
the multiplicative interaction, the additive interaction exists.
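The conditions under which the additive interaction vanishes can be checked with a few lines of base R. The sketch assumes the dummy code without the three-interaction term, so that the conditional odds of Y = 1 are a product of the Y parameters; the numerical values are hypothetical. The third condition used here, the Y-Z parameter equal to 1 / (muY^2 * muXY), is the one obtained by setting the additive interaction to 0 and solving for that parameter.

```r
# P(Y = 1 | X = x, Z = z) without three-interaction term (dummy code)
p_y <- function(x, z, muY, muXY, muZY) {
  a <- muY * muXY^x * muZY^z
  a / (1 + a)
}

# Additive interaction: p11 - p01 - p10 + p00
add_int <- function(muY, muXY, muZY) {
  p_y(1, 1, muY, muXY, muZY) - p_y(0, 1, muY, muXY, muZY) -
    p_y(1, 0, muY, muXY, muZY) + p_y(0, 0, muY, muXY, muZY)
}

muY <- 0.5; muXY <- 2                             # hypothetical values
case1 <- add_int(muY, 1, 3)                       # muXY = 1
case2 <- add_int(muY, muXY, 1)                    # muZY = 1
case3 <- add_int(muY, muXY, 1 / (muY^2 * muXY))   # third condition
generic <- add_int(muY, muXY, 3)                  # none of the conditions holds

print(c(case1 = case1, case2 = case2, case3 = case3, generic = generic))
```

The first three values are 0 up to rounding error, while the generic case gives a nonzero additive interaction even though no multiplicative interaction is present.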

In a loglinear model with dummy code and without multiplicative interaction, the cell effect is
equal to

$$Cell^{effect}_{0,1} = \frac{\eta^{Y|X=0,Z=0} + \eta^{Y|X=0,Z=1}\mu_c^{Z=1}}{\eta^{Y|X=0,Z=0} + \eta^{Y|X=0,Z=1}\mu_c^{Z=1}\mu^{Z=1,Y=1}}\;\frac{\eta^{Y|X=1,Z=0} + \eta^{Y|X=1,Z=1}\mu_c^{Z=1}\mu^{Z=1,Y=1}}{\eta^{Y|X=1,Z=0} + \eta^{Y|X=1,Z=1}\mu_c^{Z=1}} \qquad (7)$$

Of course, if the parameter $\mu^{Z=1,Y=1}$ is equal to 1 or $\mu^{X=1,Y=1}$ is equal to 1, the cell effect becomes
equal to 1 and the total effect is equal to the direct effect of X on Y or of Z on Y. In this case the cell effect
depends on the distribution Z|X and not on the value of Z, so I can write $Cell^{effect}_{x,x'}(Z) = Cell^{effect}_{x,x'}$, i.e. the cell effect can be interpreted
as a constant interaction effect (this is not true in a loglinear model with multiplicative interaction). As
seen in the introduction, indeed, the interaction effect can make the direct effect of one variable
on another a function of a third variable, so that it varies as the third variable varies, while in
this case the cell effect remains constant as the third variable varies.

Because the total effect and the direct effect used in the loglinear literature are the odds ratio
versions of the total effect and the controlled direct effect proposed by Pearl (2001, 2009, 2012), I
propose the odds ratio version of his indirect effect:

$$OR^{IE}_{x,x'} = \frac{\sum_z P(Y|X = x, Z = z)P(Z = z|X = x')}{1 - \sum_z P(Y|X = x, Z = z)P(Z = z|X = x')}\;\frac{1 - \sum_z P(Y|X = x, Z = z)P(Z = z|X = x)}{\sum_z P(Y|X = x, Z = z)P(Z = z|X = x)} \qquad (8)$$


Then the total effect is equal to

$$OR^{TE}_{x,x'} = OR^{LDE}_{x,x'}(z)\;Cell^{effect}_{x,x'}(z)\;\frac{1}{OR^{IE}_{x',x}} \qquad (9)$$
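The odds ratio effects and the decomposition of the total effect can be sketched in base R from the conditional probabilities alone. The probabilities below are hypothetical, the helper names (`mix`, `or_te`, `or_lde`, `or_nde`, `or_ie`, `cell`) are illustrative and are not functions of efflog, and the cell effect is computed as the natural direct effect divided by the direct effect of formula (5), which is how formula (6) is read here.

```r
odds <- function(p) p / (1 - p)
ch <- as.character

# Hypothetical conditional probabilities for binary X, Z, Y
p_z1 <- c("0" = 0.3, "1" = 0.6)                      # P(Z = 1 | X = x)
p_y1 <- matrix(c(0.30, 0.50, 0.55, 0.70), nrow = 2,  # P(Y = 1 | X = x, Z = z)
               dimnames = list(x = c("0", "1"), z = c("0", "1")))

# sum_z P(Y = 1 | X = xy, Z = z) P(Z = z | X = xz)
mix <- function(xy, xz) {
  pz <- unname(p_z1[ch(xz)])
  p_y1[ch(xy), "0"] * (1 - pz) + p_y1[ch(xy), "1"] * pz
}

or_te  <- function(x, x1) odds(mix(x1, x1)) / odds(mix(x, x))   # total effect
or_lde <- function(x, x1, z)                                    # direct effect given Z = z
  odds(p_y1[ch(x1), ch(z)]) / odds(p_y1[ch(x), ch(z)])
or_nde <- function(x, x1) odds(mix(x1, x)) / odds(mix(x, x))    # natural direct effect
or_ie  <- function(x, x1) odds(mix(x, x1)) / odds(mix(x, x))    # indirect effect
cell   <- function(x, x1, z) or_nde(x, x1) / or_lde(x, x1, z)   # cell effect

# Decomposition of the total effect, for both values of z
decomp <- sapply(0:1, function(z) or_lde(0, 1, z) * cell(0, 1, z) / or_ie(1, 0))
print(c(total = or_te(0, 1), decomp_z0 = decomp[1], decomp_z1 = decomp[2]))
```

The product gives the same total effect for z = 0 and z = 1, because the dependence of the direct effect on z is absorbed by the cell effect.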


The direct effect used in the loglinear literature and the cell effect form the odds ratio version of Pearl's
natural direct effect. Pearl, indeed, proposes 2 direct effects: the natural direct effect and the controlled
direct effect. The first is the change of Y when X changes and Z is kept constant at whatever value it attains
under the start value of X, while the second is the change of Y when X changes and all other factors are
held fixed. The natural direct effect is:

$$OR^{NDE}_{x,x'} = \frac{\sum_z P(Y|X = x', Z = z)P(Z = z|X = x)}{1 - \sum_z P(Y|X = x', Z = z)P(Z = z|X = x)}\;\frac{1 - P(Y|X = x)}{P(Y|X = x)} \qquad (10)$$

The natural direct effect depends on Z|X by Pearl's definition.

The interpretation of the effects calculated as odds ratios is the following: a value of the effect
bigger than 1 means that the 2 variables change in the same direction (if X increases, Y increases) and
a value of the effect smaller than 1 means that the 2 variables change in opposite directions (if X
increases, Y decreases).

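If I want to calculate the effects of the variation of X from x' to x, I obtain

$$OR^{TE}_{x',x} = \frac{1}{OR^{TE}_{x,x'}} \qquad OR^{LDE}_{x',x}(z) = \frac{1}{OR^{LDE}_{x,x'}(z)}$$

whereas, in general, $Cell^{effect}_{x',x} \neq 1/Cell^{effect}_{x,x'}$, because the cell effect of the inverse change
is weighted by P(Z|X = x') instead of P(Z|X = x).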


Now I consider the relation between the effects and the parameters. In the literature, the causal
two-effects parameters determine the presence or absence of the direct link
between the variables: for example, if I suppose that $\mu_c^{X=1,Y=1}$ (recalling that in this case $\mu_c^{X=1,Y=1} = \mu^{X=1,Y=1}$)
is equal to 1, then there is no direct effect of X on Y. In terms of the path diagram, the arrow
which goes from X to Y is not present. If I set the causal two-effects parameter $\mu_c^{X=1,Z=1}$ equal to 1, I
eliminate the direct effect of X on Z, while if I set the no causal two-effects parameter $\mu^{X=1,Z=1}$ equal
to 1, I do not eliminate the direct effect of X on Z, because only the causal parameters can determine
the presence or absence of the direct link. This can be shown using a simple example. I consider
the following no causal parameters: $\mu^{X=1,Y=1} = 0.02$, $\mu^{Z=1,Y=1} = 0.01$, $\mu^{Y=1} = 0.2$, $\mu^{X=1,Z=1} = 1$,
$\mu^{Z=1} = 2$ and $\mu^{X=1} = 1.5$. The no causal parameter $\mu^{X=1,Z=1}$ is equal to 1, i.e. there is no effect
between Z and X. If I calculate the indirect effect, I find that $OR^{IE}_{x,x'}$ is equal to 0.8894, i.e. an effect
mediated by Z exists. This occurs because $\mu_c^{X=1,Z=1}$ is equal to 1.1929, so the variable X is still
causally linked to Z even if the no causal parameter $\mu^{X=1,Z=1}$ is equal to 1. The total effect $OR^{TE}_{x,x'}$ is equal to
$OR^{LDE}_{x,x'}$ because the cell effect cancels with the inverse of the indirect effect $OR^{IE}_{x',x}$, which measures the
inverse change of X (from x' to x).

Now I consider a new loglinear model where the values of the other no causal parameters
remain equal to those of the previous example and the value of $\mu^{X=1,Z=1}$ becomes 0.8383. In this case
$OR^{IE}_{x,x'}$ is equal to 1 because the causal parameter $\mu_c^{X=1,Z=1}$ is equal to 1. In conclusion, if $\mu^{X=1,Z=1}$ or
$\mu_c^{X=1,Z=1}$ is equal to 1, the total effect $OR^{TE}_{x,x'}$ is equal, respectively, to the direct effect used in the loglinear literature
$OR^{LDE}_{x,x'}$ or to the natural direct effect, but in the first case there is the indirect effect, while in the second
case it disappears. When there is no indirect effect, the variable X influences Y only directly.
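The numerical example can be retraced in base R. This is a sketch under an explicit assumption: the symbols attached to the six values are not legible in every copy of the text, so the assignment used below (X-Y parameter 0.02, Z-Y parameter 0.01, Y parameter 0.2, no causal X-Z parameter 1, Z parameter 2) is the one that reproduces the reported values 1.1929 and 0.8894. The X parameter does not enter these quantities.

```r
odds <- function(p) p / (1 - p)

# Assumed assignment of the no causal parameters of the example
muXY <- 0.02; muZY <- 0.01; muY <- 0.2; muXZ <- 1; muZ <- 2

# Normalization factor of P(Y | X = x, Z = z), dummy code, no three-interaction
eta_y <- function(x, z) 1 / (1 + muY * muXY^x * muZY^z)

# Causal parameters of Z obtained from the ratios of normalization factors
muZ_c  <- muZ  * eta_y(0, 0) / eta_y(0, 1)
muXZ_c <- muXZ * eta_y(1, 0) * eta_y(0, 1) / (eta_y(0, 0) * eta_y(1, 1))

p_z1 <- function(x) { a <- muZ_c * muXZ_c^x; a / (1 + a) }              # P(Z = 1 | X = x)
p_y1 <- function(x, z) { a <- muY * muXY^x * muZY^z; a / (1 + a) }      # P(Y = 1 | X = x, Z = z)
mix  <- function(xy, xz) p_y1(xy, 0) * (1 - p_z1(xz)) + p_y1(xy, 1) * p_z1(xz)

or_ie <- odds(mix(0, 1)) / odds(mix(0, 0))   # indirect effect of the change 0 -> 1
or_te <- odds(mix(1, 1)) / odds(mix(0, 0))   # total effect of the change 0 -> 1

print(round(c(muXZ_c = muXZ_c, or_ie = or_ie, or_te = or_te), 4))
```

Under this assignment the total effect coincides with the X-Y parameter 0.02, i.e. with the direct effect used in the loglinear literature, while the indirect effect differs from 1.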

Now I consider a causal loglinear model with the multiplicative interaction. Then Y is influenced
directly by the variable X, by the variable Z and by their joint effect due to the three-interaction term.
Using the definition of multiplicative interaction, the direct effect of X on Y used in the loglinear
literature becomes a function of Z. I show this recalling that the formulas (4), (5), (8) and (10) remain
valid and applying the formula (5) to a causal loglinear model with dummy code. The direct effect
used in the loglinear literature becomes:

$$OR^{LDE}_{0,1}(z) = \mu^{X=1,Y=1}\mu^{X=1,Z=z,Y=1}$$

For the same reason, also the cell effect becomes a function of Z:

$$Cell^{effect}_{0,1}(z) = \frac{1}{\mu^{X=1,Z=z,Y=1}}\;\frac{\eta^{Y|X=0,Z=0} + \eta^{Y|X=0,Z=1}\mu_c^{Z=1}}{\eta^{Y|X=0,Z=0} + \eta^{Y|X=0,Z=1}\mu_c^{Z=1}\mu^{Z=1,Y=1}}\;\frac{\eta^{Y|X=1,Z=0} + \eta^{Y|X=1,Z=1}\mu^{X=1,Z=1,Y=1}\mu_c^{Z=1}\mu^{Z=1,Y=1}}{\eta^{Y|X=1,Z=0} + \eta^{Y|X=1,Z=1}\mu_c^{Z=1}}$$

The natural direct, indirect and total effects, instead, do not become functions of Z. The indirect effect
of a model with multiplicative interaction remains equal to that of a model without multiplicative
interaction.


efflog package 

Estimation procedure 

In the first section, I have presented two formulations of the same model: they are founded on two
different assumptions (causal model and no causal model) and are estimated with two different
approaches. In a loglinear model without the multiplicative interaction the parameters of the additive
form can be estimated as follows:

# Loglinear model:

fit.glm <- glm(count ~ .^2, data = table, family = poisson)

# where table is the frequency of the variables X, Z and Y

while in a causal loglinear model without the multiplicative interaction, I use the package efflog
(Gheno (2015)) to estimate the parameters of the additive form

# Causal loglinear model:

library(efflog)

Cloglin(table)

# where table is the frequency of the variables X, Z and Y

Of course, to obtain the parameters of the multiplicative form, it is sufficient to make the
transformation $\mu = \exp(\log(\mu))$. In efflog there is the command

exp_par(table)

which calculates the causal parameters in multiplicative form. The parameters of the causal form (i.e.
those with subscript c) are estimated by the causal loglinear model, the parameters without subscript
are estimated by the traditional loglinear model. Only the parameters of the conditional probability
$\pi^{Y=y|X=x,Z=z}$ are equal in both forms and for this reason I never use the subscript c for
them.
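The traditional (no causal) fit and the passage to the multiplicative form can be illustrated with base R only. The counts below are invented for illustration, and the data frame layout (one row per cell with a `count` column) is an assumption about how the frequency table is stored; the efflog functions are not called here.

```r
# Hypothetical 2 x 2 x 2 frequency table (counts invented for illustration)
tab <- expand.grid(X = factor(0:1), Z = factor(0:1), Y = factor(0:1))
tab$count <- c(40, 25, 30, 35, 20, 30, 25, 60)

# Loglinear model with all two-way terms and no three-interaction term
fit <- glm(count ~ .^2, data = tab, family = poisson)

# Additive (log) parameters and their multiplicative counterparts
log_mu <- coef(fit)
mu_hat <- exp(log_mu)
print(round(mu_hat, 4))
```

With treatment contrasts, which are the default in R, the coefficients follow the dummy code. The exponentiated intercept is the expected count of the reference cell, so it plays the role of the normalizing constant on the count scale rather than on the probability scale.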

Now I consider that the three-interaction parameter $\mu^{X=1,Z=1,Y=1}$ is different from 1. Then the path diagram
of the direct effects on Y alone is shown in Figure 2. The introduction of the three-interaction parameter
produces a multiplicative interaction effect on Y. If I consider the marginal probability of X and the
conditional probabilities of Z and of Y, the introduction of the interaction term modifies only the
formula (3): now the three-interaction term is added to the conditional probability of Y given X and Z,
so the model becomes:

$$\pi^{Z=z|X=x} = \eta_c^{Z|X=x}\mu_c^{Z=z}\mu_c^{X=x,Z=z}$$

$$\pi^{Y=y|X=x,Z=z} = \eta^{Y|X=x,Z=z}\mu^{Y=y}\mu^{X=x,Y=y}\mu^{Z=z,Y=y}\mu^{X=x,Z=z,Y=y}$$

Then in a loglinear model with the multiplicative interaction the parameters of the additive form can
be estimated as follows:

# Loglinear model:

fit.glm <- glm(count ~ .^3, data = table, family = poisson)

# where table is the frequency of the variables X, Z and Y

while in a causal loglinear model with the multiplicative interaction, I use the package efflog
(Gheno (2015))

# Causal loglinear model:

library(efflog)

Cloglin_mult(table)

# where table is the frequency of the variables X, Z and Y

Of course, to obtain the parameters of the multiplicative form, it is sufficient to make the
transformation $\mu = \exp(\log(\mu))$. In efflog there is the command

exp_par_mult(table)

which calculates the causal parameters in multiplicative form.

Causal effects 

The package efflog (Gheno (2015)) provides functions to calculate these effects directly. The commands
for the effects of a loglinear model without multiplicative interaction are:

cell_effect_or(x,y,z,w) 
ndirect_effect_or(x,y,z,w,t) 
indirect_effect_or(x,y,z,w,t) 
total_effect_or(x,y,z,w,t) 

where x = $\mu^{Y=1}$, y = $\mu^{X=1,Y=1}$, z = $\mu^{Z=1,Y=1}$, w = $\mu_c^{Z=1}$ and t = $\mu_c^{X=1,Z=1}$.

The commands for calculating the effects of a loglinear model with multiplicative interaction are: 

cell_effect_mult_or(x,y,z,w,q) 
ndirect_effect_mult_or(x,y,z,w,t,q) 
indirect_effect_or(x,y,z,w,t) 
total_effect_mult_or(x,y,z,w,t,q) 

where x = $\mu^{Y=1}$, y = $\mu^{X=1,Y=1}$, z = $\mu^{Z=1,Y=1}$, w = $\mu_c^{Z=1}$, t = $\mu_c^{X=1,Z=1}$ and q = $\mu^{X=1,Z=1,Y=1}$.

Empirical examples 

In this section, I apply my causal theory and the package efflog to empirical data, which concern
the relations between a typical product (in this case Sauris' ham) and its festival. This analysis is
developed in marketing, but it can be applied in many economic fields or in social sciences.

Example 1 

The first dataset is composed of 3 dichotomous variables (X measures the interest in Sauris' ham,
considering the possibility of buying Sauris' ham; Z measures the satisfaction with Sauris' festival,
considering the happiness which an individual feels when thinking about Sauris' festival; and Y measures
the future behavior, considering whether an individual will buy Sauris' ham more often). The results of the
causal loglinear model are shown in table 5. The two-effects parameters are all significant (i.e. all are
different from 1). According to the traditional loglinear literature, the causal two-effects parameters
are the direct effects. In this case, because all causal two-effects parameters are greater than 1, an
increase of variable X produces an increase of variable Z, and the same result occurs for the relation
between X and Y and for that between Z and Y. Now I calculate the effects using the formulas (4), (6),
(8) and (10). The total effect is equal to 2.4008, so an increase of X produces an increase of Y; the
natural direct effect is equal to 1.8741, so an increase of X produces an increase of Y. The indirect
effect is equal to 1.2845: an increase of X produces, indirectly, an increase of Y. The cell effect is 0.9741:
it mitigates the controlled direct effect. The presence of 2 variables which influence Y causes the cell
effect, and so the natural direct effect becomes 1.8741. Now I check whether the additive interaction effect
is equal to 0, i.e. whether $\pi^{Y=1|X=1,Z=1} - \pi^{Y=1|X=0,Z=1} - \pi^{Y=1|X=1,Z=0} + \pi^{Y=1|X=0,Z=0} = 0$: I apply a z-test (appendix B) and I find that z
is equal to 0.4174 (the p-value is 0.6764), so I do not reject the hypothesis that the additive interaction effect
is equal to 0. This is an example where the additive interaction is equal to 0 and the cell effect is different
from 1.
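The reported effects of the first dataset can be retraced from the estimates of table 5 with base R. This is a sketch under an assumption: the values are assigned to the parameters as X-Y = 1.9240, Z-Y = 2.4038, Y = 0.4881, causal X-Z = 3.3059 and causal Z = 0.4659, which is the assignment that reproduces the effects quoted in the text. The helper names are illustrative and are not efflog functions.

```r
odds <- function(p) p / (1 - p)

# Assumed assignment of the table 5 estimates
muXY <- 1.9240; muZY <- 2.4038; muY <- 0.4881
muXZ_c <- 3.3059; muZ_c <- 0.4659

p_z1 <- function(x) { a <- muZ_c * muXZ_c^x; a / (1 + a) }           # P(Z = 1 | X = x)
p_y1 <- function(x, z) { a <- muY * muXY^x * muZY^z; a / (1 + a) }   # P(Y = 1 | X = x, Z = z)
mix  <- function(xy, xz) p_y1(xy, 0) * (1 - p_z1(xz)) + p_y1(xy, 1) * p_z1(xz)

te   <- odds(mix(1, 1)) / odds(mix(0, 0))   # total effect, formula (4)
nde  <- odds(mix(1, 0)) / odds(mix(0, 0))   # natural direct effect, formula (10)
ie   <- odds(mix(0, 1)) / odds(mix(0, 0))   # indirect effect, formula (8)
cell <- nde / muXY                          # cell effect: NDE over the X-Y parameter

# Two-sided p-value of the reported z statistic
p_value <- 2 * pnorm(-abs(0.4174))

print(round(c(te = te, nde = nde, ie = ie, cell = cell, p_value = p_value), 4))
```

Small differences in the last digit can arise because the inputs are the rounded estimates of table 5.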

From this analysis, I conclude that if a customer becomes interested in Sauris' ham, then he will
buy Sauris' ham more often, thanks also to the happiness due to Sauris' festival. In marketing research,
this means that an event linked to the product can increase its sales. However, the role of this event is
less important than the interest in the product (indirect effect/total effect < direct effect/total
effect) and their joint effect decreases the direct effect used in the loglinear literature (cell effect < 1).
parameter               value
$\mu^{X=1,Y=1}$         1.9240**
$\mu^{Z=1,Y=1}$         2.4038***
$\mu^{Y=1}$             0.4881***
$\mu_c^{X=1,Z=1}$       3.3059***
$\mu_c^{Z=1}$           0.4659***
$\mu_c^{X=1}$           1.7132***

Table 5: First dataset

parameter               value
$\mu^{X=1,Z=1,Y=1}$     2.8826*
$\mu^{X=1,Y=1}$         1.4042
$\mu^{Z=1,Y=1}$         3.5385**
$\mu^{Y=1}$             0.2826***
$\mu_c^{X=1,Z=1}$       3.5534***
$\mu_c^{Z=1}$           0.3390***
$\mu_c^{X=1}$           1.2278.

Table 6: Second dataset

Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1


Example 2 

Now I consider a second dataset. This dataset is composed of 3 dichotomous variables (X measures
the interest in Sauris' ham, considering the possibility of tasting Sauris' ham; Z measures the
satisfaction with Sauris' festival, considering the quality of the products presented during Sauris'
festival; and Y measures the future behavior, considering whether an individual will suggest that others go to
Sauris' festival). The values of the parameters are shown in table 6. The total effect is equal to 3.1886,
so an increase of X produces an increase of Y. The natural direct effect is equal to 1.7286, so an
increase of X produces an increase of Y. The indirect effect is equal to 1.4493: an increase of X produces,
indirectly, an increase of Y. The cell effect, i.e. the effect of the presence of 2 variables which influence
future behavior, is 0.4270 with Z=1, i.e. it mitigates the LD effect, while the cell effect with Z=0 is
1.231002, i.e. it increases the LD effect. Now I consider the effect of the multiplicative interaction,
whose parameter is bigger than 1. When Z is high (Z=1), the joint effect of satisfaction and interest
(multiplicative interaction effect) increases the positive LD effect, while when it is low (Z=0), it leaves
the LD effect intact. For any value of satisfaction, then, the overall interaction effect (cell effect +
multiplicative interaction effect) is positive because $\mu^{X=1,Y=1}$ is equal to 1.4 and the natural direct
effect is always equal to 1.7286.
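The same check can be repeated for the second dataset, now with the three-interaction parameter. As before, this base R sketch assumes an assignment of the table 6 estimates to the parameters (three-interaction = 2.8826, X-Y = 1.4042, Z-Y = 3.5385, Y = 0.2826, causal X-Z = 3.5534, causal Z = 0.3390), chosen because it reproduces the effects quoted in the text; the helper names are illustrative.

```r
odds <- function(p) p / (1 - p)

# Assumed assignment of the table 6 estimates
muXZY <- 2.8826; muXY <- 1.4042; muZY <- 3.5385; muY <- 0.2826
muXZ_c <- 3.5534; muZ_c <- 0.3390

p_z1 <- function(x) { a <- muZ_c * muXZ_c^x; a / (1 + a) }    # P(Z = 1 | X = x)
p_y1 <- function(x, z) {                                      # P(Y = 1 | X = x, Z = z)
  a <- muY * muXY^x * muZY^z * muXZY^(x * z)
  a / (1 + a)
}
mix <- function(xy, xz) p_y1(xy, 0) * (1 - p_z1(xz)) + p_y1(xy, 1) * p_z1(xz)

te  <- odds(mix(1, 1)) / odds(mix(0, 0))       # total effect
nde <- odds(mix(1, 0)) / odds(mix(0, 0))       # natural direct effect
ie  <- odds(mix(0, 1)) / odds(mix(0, 0))       # indirect effect
lde <- function(z) muXY * muXZY^z              # direct effect as a function of z
cell <- function(z) nde / lde(z)               # cell effect as a function of z

print(round(c(te = te, nde = nde, ie = ie,
              cell_z0 = cell(0), cell_z1 = cell(1)), 4))
```

The product of the cell effect and the multiplicative interaction parameter is the same for both values of Z, which is the overall interaction effect discussed in the text.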

From this analysis, I conclude that if a customer becomes interested in Sauris' ham, then he will
suggest going to Sauris' festival more often, thanks also to the quality of the presented products and to the
overall joint effect of interest and of satisfaction.


Summary 


When a researcher analyzes the data, he is interested in understanding the mechanisms which govern
the changes of the variables. To understand these mechanisms he uses the causal effects. Unfortunately,
when the researcher uses the loglinear models to study the data, he has no causal theory available,
but only a few comments in various papers where the odds ratios are used. For this reason, using the
causal concepts provided by Pearl (2001, 2009, 2012), I provide an R package, efflog (Gheno (2015)), to
calculate the effects in the loglinear models using odds ratios, so that the parameters have the same
interpretation given by the loglinear literature. In doing so I find a new effect, which I call cell effect. It
can be interpreted as an interaction effect which occurs whenever I consider two variables affecting a
third. The interaction effects in a causal loglinear model are three: multiplicative interaction effect,
additive interaction effect and cell effect. The researcher who studies his data with the causal
theory proposed in this paper and with the R package efflog will therefore have the traditional effects (direct,
indirect and total) plus a new interaction effect.
Appendix A: Additive interaction: a measure of the linearity


I analyze the relation between the causal log-linear model and linearity. As seen in section
2, the causal log-linear model analyzes the relation among the variables using the cell frequencies,
so the causal relation can be analyzed only using the conditional probabilities; but the relation
among the variables can be expressed by any function, for example Y = f(X). A simple linear model
requires that the causal relation among the variables is linear, i.e. $Y = \beta_0 + \gamma_1 X$. Of course the world
is not perfect: it is necessary to introduce an error term, so the relation between Y and X becomes
$Y = \beta_0 + \gamma_1 X + \zeta$. In its simpler formulation, the linear model considers the variables X, Y and $\zeta$
continuous and normally distributed. Now I analyze what occurs in a log-linear model if the relation
between X and Y is linear. In a first step, I consider a perfect world, i.e. one where Y is perfectly given by the
relation $\beta_0 + \gamma_1 X$ and X and Y are continuous variables with a generic joint distribution P(X,Y). To
analyze the same variables with a causal log-linear model, I must discretize the continuous variables.
Now I transform X and Y into two binary variables X* and Y* as follows: the values of X (or Y) which are
smaller than the mean become 0, the values of X (or Y) which are bigger than the mean become 1. This
particular transformation is made so that the linearity is carried into the causal log-linear model. The
marginal probabilities of the new variables X* and Y* are:



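$$P(X^*) = \begin{cases} P(X \le E(X)) & X^* = 0 \\ P(X > E(X)) & X^* = 1 \end{cases} \qquad (11)$$

$$P(Y^*) = \begin{cases} P(Y \le E(Y)) & Y^* = 0 \\ P(Y > E(Y)) & Y^* = 1 \end{cases} \qquad (12)$$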


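Now, using the linear relation between X and Y, I obtain that:

$$Y \le E(Y) \Rightarrow \beta_0 + \gamma_1 X \le \beta_0 + \gamma_1 E(X)$$

I simplify and obtain that

$$Y \le E(Y) \text{ is equal to } \gamma_1 X \le \gamma_1 E(X)$$

i.e.

$$P(Y^* = 0) = P(Y \le E(Y)) = \begin{cases} P(X \le E(X)) = P(X^* = 0) & \text{if } \gamma_1 > 0 \\ P(X \ge E(X)) & \text{if } \gamma_1 < 0 \end{cases} \qquad (13)$$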



            W = 1    W = 0
T = 1         1        0        1
T = 0         0        0        0
              1        0        1

Table 7: The joint probability with $\gamma_1 > 0$, without error term

            W = 1        W = 0
T = 1       $\pi_{11}$   $\pi_{10}$   $\pi_{1+}$
T = 0       $\pi_{01}$   $\pi_{00}$   $\pi_{0+}$
            $\pi_{+1}$   $\pi_{+0}$   1

Table 8: The joint probability with $\gamma_1 > 0$ and error term

Because X is a continuous variable, the sign of equality in the inequality is not important, so the
formula (13) can be written as follows:

$$P(Y^* = 0) = P(Y < E(Y)) = \begin{cases} P(X < E(X)) = P(X^* = 0) & \text{if } \gamma_1 > 0 \\ P(X > E(X)) = P(X^* = 1) & \text{if } \gamma_1 < 0 \end{cases} \qquad (14)$$

To analyze the relation between X* and Y*, I consider the variables T and W, which are built as follows:

$$T = \begin{cases} 1 & \text{with probability } p(Y^* = 1|X^* = 1) \\ 0 & \text{with probability } p(Y^* = 0|X^* = 1) \end{cases} \qquad W = \begin{cases} 1 & \text{with probability } p(Y^* = 0|X^* = 0) \\ 0 & \text{with probability } p(Y^* = 1|X^* = 0) \end{cases}$$

If $\gamma_1$ is positive, the joint distribution of T and W is shown in table 7: without error, the probability
of Y* equal to 1 given X* equal to 1 is 1, i.e. it is the certain event. The event "Y* equal to 0 given X*
equal to 0" is also the certain event.
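The effect of the discretization at the mean in the perfect world can be illustrated with base R. The values of X and the coefficients below are hypothetical; the sketch only shows that, without error term, the binary versions of X and Y are perfectly concordant when the slope is positive and perfectly discordant when it is negative.

```r
x <- c(-2.0, -0.5, 0.3, 1.2, 4.0)   # hypothetical values of X
beta0 <- 1; gamma1 <- 2             # hypothetical coefficients, gamma1 > 0

# Perfect world: Y is an exact linear function of X
y <- beta0 + gamma1 * x

# Discretization at the mean: 0 below the mean, 1 above the mean
x_star <- as.integer(x > mean(x))
y_star <- as.integer(y > mean(y))

# Cross table of the two binary variables: all mass on the diagonal
tab <- table(factor(x_star, levels = 0:1), factor(y_star, levels = 0:1),
             dnn = c("X*", "Y*"))
print(tab)

# With a negative slope the two binary variables are mirror images
y_neg <- beta0 - gamma1 * x
y_neg_star <- as.integer(y_neg > mean(y_neg))
```

In this situation the events "Y* = 1 given X* = 1" and "Y* = 0 given X* = 0" both have probability 1, which corresponds to the degenerate joint table of the perfect world.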


Unfortunately, the world is not perfect and the relation between X and Y contains an error term, which
has zero mean. Then I obtain:

$$Y \le E(Y) \Rightarrow \beta_0 + \gamma_1 X + \zeta \le \beta_0 + \gamma_1 E(X)$$

I simplify and obtain that

$$Y \le E(Y) = \beta_0 + \gamma_1 E(X) \text{ is equal to } \gamma_1[X - E(X)] \le -\zeta$$

With error term and $\gamma_1$ bigger than 0, the joint probability of the variables T and W is shown in table
8: there is no certain event as in the case without error term, because the presence of the error term
produces discordant events (i.e. "Y* equal to 0 given X* equal to 1" or "Y* equal to
1 given X* equal to 0"). I follow Tutz's method (Tutz (2011)) for repeated measurements of
binary variables. Repeated measurements occur when the researcher measures the same variables
at different times or under different conditions. To analyze whether the distribution changes over times
or conditions, he considers the joint distribution of the repeated measurements and checks whether the
marginal homogeneity holds. The marginal homogeneity can be seen in the table 7: it holds if $\pi_{+1}$ is
equal to $\pi_{1+}$. In the perfect world, the marginal homogeneity calculated for the joint distribution of
the binary variables T and W holds: $\pi_{+1} = \pi_{1+} = 1$. In the imperfect world the homogeneity
holds iff $\pi_{+1}$ is equal to $\pi_{1+}$. I consider the log-linear model shown in table 9, where I use the dummy
code. Then the marginal homogeneity condition becomes:


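\[
P(Y^* = 1 | X^* = 1) = \frac{\eta\,\mu^{X^*=1}\mu^{Y^*=1}\mu^{Y^*=1,X^*=1}}{\eta\,\mu^{X^*=1}\,[1 + \mu^{Y^*=1}\mu^{Y^*=1,X^*=1}]} = \frac{1}{1 + \mu^{Y^*=1}} = P(Y^* = 0 | X^* = 0)
\]

i.e. the two-effects non-causal parameter $\mu^{Y^*=1,X^*=1}$ is equal to the reciprocal of the squared non-causal
parameter $\mu^{Y^*=1}$ (i.e. $\mu^{Y^*=1,X^*=1} = 1/[\mu^{Y^*=1}]^2$).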
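The algebra behind this result is short and may be worth spelling out. The block below is a worked sketch, assuming the dummy-coded multiplicative parametrization of table 9. The shorthand a and b is introduced here only for readability and is not notation from the source.

```latex
% Shorthand (an assumption for readability): a = \mu^{Y^*=1}, b = \mu^{Y^*=1,X^*=1}
\begin{align*}
P(Y^*=1 \mid X^*=1) &= \frac{ab}{1+ab},
\qquad
P(Y^*=0 \mid X^*=0) = \frac{1}{1+a} \\
\frac{ab}{1+ab} = \frac{1}{1+a}
  &\iff ab\,(1+a) = 1 + ab \\
  &\iff a^{2} b = 1 \\
  &\iff b = \frac{1}{a^{2}}
\end{align*}
```

All denominators are positive because the multiplicative parameters are positive, so each step is an equivalence.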

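    X*   Y*   $\mu^{X^*=x^*,Y^*=y^*}$
    0    0    $\eta$
    0    1    $\eta\,\mu^{Y^*=1}$
    1    0    $\eta\,\mu^{X^*=1}$
    1    1    $\eta\,\mu^{X^*=1}\mu^{Y^*=1}\mu^{Y^*=1,X^*=1}$

Table 9: The joint probability of simple linear model


    X*   Z*   Y*   $\mu^{X^*=x^*,Z^*=z^*,Y^*=y^*}$
    0    0    0    $\eta$
    0    0    1    $\eta\,\mu^{Y^*=1}$
    0    1    0    $\eta\,\mu^{Z^*=1}$
    0    1    1    $\eta\,\mu^{Z^*=1}\mu^{Y^*=1}\mu^{Y^*=1,Z^*=1}$
    1    0    0    $\eta\,\mu^{X^*=1}$
    1    0    1    $\eta\,\mu^{X^*=1}\mu^{Y^*=1}\mu^{Y^*=1,X^*=1}$
    1    1    0    $\eta\,\mu^{X^*=1}\mu^{Z^*=1}\mu^{X^*=1,Z^*=1}$
    1    1    1    $\eta\,\mu^{X^*=1}\mu^{Y^*=1}\mu^{Z^*=1}\mu^{X^*=1,Y^*=1}\mu^{X^*=1,Z^*=1}\mu^{Y^*=1,Z^*=1}\mu^{X^*=1,Y^*=1,Z^*=1}$

Table 10: The joint probability of mediation linear model


Now I consider a mediation linear model in a perfect world, where X linearly influences Y and Z,
which in turn linearly influences Y. This model is:

\[
Z = \alpha_0 + \alpha_1 X
\]

\[
Y = \omega_0 + \omega_1 Z + \omega_2 X
\]

This model can be rewritten in reduced form, i.e.:

\[
Y = (\omega_0 + \omega_1 \alpha_0) + (\omega_1 \alpha_1 + \omega_2) X
\]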


which is equal to the relation between X and Y analyzed until now, with $\beta_0 = \omega_0 + \omega_1\alpha_0$ and
$\gamma_1 = \omega_1\alpha_1 + \omega_2$. Now I transform X, Z and Y into
binary variables X*, Z* and Y* (0 if the value of the variable is smaller than its mean, 1 if the value of the
variable is bigger than its mean). As in the simple linear model, if $\alpha_1$ is positive and there is no error
term, the probability P(Z* = 0) is equal to the probability P(X* = 0). If $\omega_1$, $\alpha_1$ and $\omega_2$ are positive,
the probability P(Y* = 0) is also equal to the probability P(X* = 0), because in this case $\gamma_1$ in the reduced form
is positive. Then the variables T and W become:


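\[
T = \begin{cases} 1 & \text{if } P(Y^* = 1 | X^* = 1, Z^* = 1) \\ 0 & \text{if } P(Y^* = 0 | X^* = 1, Z^* = 1) \end{cases}
\qquad
W = \begin{cases} 1 & \text{if } P(Y^* = 0 | X^* = 0, Z^* = 0) \\ 0 & \text{if } P(Y^* = 1 | X^* = 0, Z^* = 0) \end{cases}
\]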
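The reduced form and the perfect-world behaviour of the binary variables can be checked with a short simulation. This is a sketch under stated assumptions: the coefficient values are arbitrary positive (or, for the intercepts, arbitrary) numbers chosen for illustration, X is drawn from a standard normal distribution, and sample means stand in for the expectations.

```python
import random

def binarize(values):
    """Code each value as 1 if it is above the sample mean, 0 otherwise."""
    mean = sum(values) / len(values)
    return [1 if v > mean else 0 for v in values]

def mediation_sample(alpha0, alpha1, omega0, omega1, omega2, n=5000, seed=1):
    """Simulate Z = alpha0 + alpha1*X and Y = omega0 + omega1*Z + omega2*X without error terms."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = [alpha0 + alpha1 * xi for xi in x]
    y = [omega0 + omega1 * zi + omega2 * xi for zi, xi in zip(z, x)]
    return x, z, y

def reduced_form(alpha0, alpha1, omega0, omega1, omega2):
    """Intercept and slope of Y on X after substituting Z out."""
    return omega0 + omega1 * alpha0, omega1 * alpha1 + omega2

# Hypothetical coefficients, not taken from the source.
params = dict(alpha0=0.5, alpha1=1.2, omega0=-1.0, omega1=0.8, omega2=0.4)
x, z, y = mediation_sample(**params)
beta0, gamma1 = reduced_form(**params)

# Largest discrepancy between the structural Y and the reduced-form Y.
max_gap = max(abs(yi - (beta0 + gamma1 * xi)) for xi, yi in zip(x, y))

# Share of units for which X*, Z* and Y* take the same value.
xs, zs, ys = binarize(x), binarize(z), binarize(y)
share_equal = sum(a == b == c for a, b, c in zip(xs, zs, ys)) / len(xs)
print(beta0, gamma1, max_gap, share_equal)
```

When all three slopes are positive and no error term is present, the three binary variables coincide, so only the cells X* = Z* = Y* = 1 and X* = Z* = Y* = 0 are populated.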


In a perfect world, all the conditional probabilities of Y* given Z* and X* are equal to 0 except in the
cases Y* = X* = Z* = 1 and Y* = X* = Z* = 0. The log-linear model is shown in table
10. If I introduce the error terms and use Tutz's method, with the multiplicative interaction
parameter $\mu^{Y^*=1,X^*=1,Z^*=1}$ set to 1 (i.e. without multiplicative interaction effect), the marginal homogeneity condition becomes:

\[
\frac{\mu^{Y^*=1}\mu^{Y^*=1,Z^*=1}\mu^{Y^*=1,X^*=1}}{1 + \mu^{Y^*=1}\mu^{Y^*=1,Z^*=1}\mu^{Y^*=1,X^*=1}} = \frac{1}{1 + \mu^{Y^*=1}}
\]

i.e. the two-effects non-causal parameter $\mu^{Y^*=1,X^*=1}$ is equal to the reciprocal of the product between the
squared non-causal parameter $\mu^{Y^*=1}$ and the two-effects non-causal parameter $\mu^{Y^*=1,Z^*=1}$
(i.e. $\mu^{Y^*=1,X^*=1} = 1/\{[\mu^{Y^*=1}]^2\,\mu^{Y^*=1,Z^*=1}\}$). This condition does not imply the condition P(Y* = 1|X* = 1) = P(Y* = 0|X* = 0):
indeed the error term in the relation between the variables X* and Z* causes the inequality
$P(Y^* = 1 | X^* = 1) \ne P(Y^* = 0 | X^* = 0)$. Then I must also consider the relation between the variables X* and
Z*. The marginal homogeneity condition holds iff $\mu_c^{Z^*=1,X^*=1} = 1/[\mu_c^{Z^*=1}]^2$, where c indicates that the
parameters are those of a causal log-linear model. As seen in section 2, the causal parameters can
always be transformed into non-causal log-linear parameters. Then the mediation linear model implies that:


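\[
\begin{cases}
\mu^{Y^*=1,X^*=1} = \dfrac{1}{[\mu^{Y^*=1}]^2\,\mu^{Y^*=1,Z^*=1}} \\[2ex]
\mu^{Z^*=1,X^*=1} = \dots
\end{cases} \tag{15}
\]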
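The first constraint in (15), the one linking the parameter for Y* and X* to the parameters for Y* alone and for Y* and Z*, can be checked numerically. The following is a minimal sketch, assuming the dummy-coded model of table 10 without the three-way multiplicative term. The function name and the parameter values are hypothetical and chosen only for illustration.

```python
def cond_prob_y1(mu_y, mu_yx=1.0, mu_yz=1.0):
    """P(Y* = 1 | X*, Z*) in the dummy-coded log-linear model without three-way term.

    Pass mu_yx and mu_yz only for the conditioning variables that equal 1;
    a variable equal to 0 contributes a factor of 1 under dummy coding.
    """
    odds = mu_y * mu_yx * mu_yz
    return odds / (1.0 + odds)

# Illustrative values (assumptions, not estimates from the source).
mu_y, mu_yz = 0.5, 1.6
mu_yx = 1.0 / (mu_y ** 2 * mu_yz)  # impose the first constraint

p_t1 = cond_prob_y1(mu_y, mu_yx, mu_yz)  # P(Y* = 1 | X* = 1, Z* = 1)
p_w1 = 1.0 - cond_prob_y1(mu_y)          # P(Y* = 0 | X* = 0, Z* = 0)
print(p_t1, p_w1)
```

Under the constraint the two conditional probabilities agree, which is the marginal homogeneity condition for T and W.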


If the two conditions in (15) are satisfied, the equivalence P(Y* = 1|X* = 1) = P(Y* = 0|X* = 0)
is true. I conclude that if the variables X, Z and Y are supposed to be linearly linked, then the
corresponding parameters of the causal model must satisfy the constraints (15). This is important because the first
constraint of (15) causes the nullity of the additive interaction effect in a causal loglinear model without
multiplicative interaction effect.


Appendix B: Test on the presence of the additive interaction term in a 
loglinear model without interaction 


In this section I derive a test for the presence of the cell effect obtained by Pearl's causal formula.
For simplicity, I consider the parameters in additive form. Agresti (2002) shows that the non-causal
loglinear model without the two-effects parameter for a 2x2 table (i.e. a contingency table for 2 variables, X
and Y) can be written as:


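\[
\log(m) = \log \begin{bmatrix} m^{X=0,Y=0} \\ m^{X=0,Y=1} \\ m^{X=1,Y=0} \\ m^{X=1,Y=1} \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix} \log(\eta) \\ \log(\mu^{X=1}) \\ \log(\mu^{Y=1}) \end{bmatrix}
= D\lambda \tag{16}
\]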


where m denotes the column vector of the expected counts of the contingency table and $\lambda$ is the vector
of the additive non-causal log-linear parameters. Formula (16) can be extended to a loglinear model
with all interactions for an n x n contingency table. Then, in a general non-causal log-linear model, using
the maximum likelihood method, the variance-covariance matrix of the estimated additive non-causal
parameters is

\[
Cov(\log(\hat{\mu})) = Cov(\hat{\lambda}) = [D'\,\mathrm{diag}(m)\,D]^{-1} \tag{17}
\]
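Formula (17) can be evaluated directly for the 2x2 model of (16). The sketch below uses only the Python standard library; the expected counts are hypothetical values chosen to satisfy independence, and the helper names are introduced here for illustration.

```python
def information(design, counts):
    """Compute D' diag(m) D for a design matrix D (list of rows) and expected counts m."""
    k = len(design[0])
    return [[sum(m * row[a] * row[b] for row, m in zip(design, counts))
             for b in range(k)] for a in range(k)]

def inverse(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Design matrix of (16): columns are constant, X = 1, Y = 1;
# rows are the cells (X,Y) = (0,0), (0,1), (1,0), (1,1).
D = [[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]

# Hypothetical expected counts (an assumption), in the same cell order.
m = [40.0, 10.0, 20.0, 5.0]

cov = inverse(information(D, m))
for row in cov:
    print([round(v, 5) for v in row])
```

The diagonal entries of `cov` are the variances of the estimated additive parameters and the off-diagonal entries are their covariances, which is what the z-test on a linear combination of parameters needs.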

Now I consider the particular case where the additive interaction is equal to 0 even if 2 variables
influence the variable Y. This occurs if $\mu^{Z=1,Y=1}\,[\mu^{Y=1}]^2\,\mu^{X=1,Y=1} = 1$. Because these parameters
are the same in the causal loglinear model and in the loglinear model, this relation can be tested
in either model. I test this relation in the loglinear model.
For simplicity, I consider the additive parametrization, so the relation becomes:


\[
\log(\mu^{Z=1,Y=1}) + 2\log(\mu^{Y=1}) + \log(\mu^{X=1,Y=1}) = \lambda^{Z=1,Y=1} + 2\lambda^{Y=1} + \lambda^{X=1,Y=1} = 0 \tag{18}
\]


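Now I propose a z-test. Because the vector of the estimated $\lambda$ parameters is distributed as a multivariate
normal, the left-hand side of (18), computed from the estimates, is a normally distributed variable $\hat{\beta}$ with mean equal to

\[
E(\hat{\lambda}^{Z=1,Y=1} + 2\hat{\lambda}^{Y=1} + \hat{\lambda}^{X=1,Y=1}) = \lambda^{Z=1,Y=1} + 2\lambda^{Y=1} + \lambda^{X=1,Y=1} = \beta,
\]

and variance equal to

\[
\begin{aligned}
Var(\hat{\beta}) ={}& Var(\hat{\lambda}^{Z=1,Y=1}) + 4\,Var(\hat{\lambda}^{Y=1}) + Var(\hat{\lambda}^{X=1,Y=1}) \\
&+ 4\,Cov(\hat{\lambda}^{Z=1,Y=1}, \hat{\lambda}^{Y=1}) + 2\,Cov(\hat{\lambda}^{Z=1,Y=1}, \hat{\lambda}^{X=1,Y=1}) + 4\,Cov(\hat{\lambda}^{X=1,Y=1}, \hat{\lambda}^{Y=1})
\end{aligned}
\]

Then the statistic z, which is equal to $(\hat{\beta} - \beta)/\sqrt{Var(\hat{\beta})}$, is normally distributed with mean equal to
0 and variance equal to 1. Now I have a statistic to test whether the cell effect is equal to 0. The equality
(18) requires that $\beta$ is equal to 0, so testing the equality condition is equivalent to testing that $\beta$ is equal to
0.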
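The test can be sketched in a few lines. The estimates and the covariance matrix below are hypothetical numbers used only to show the computation; in practice they would come from the fitted loglinear model and from formula (17). The function name is an assumption of this sketch.

```python
import math

def z_test_linear_combination(estimates, cov, weights):
    """z-test of H0: sum_i weights[i] * parameter_i = 0.

    Returns the estimated combination, its variance, the z statistic
    and the two-sided p-value under the standard normal distribution.
    """
    k = len(weights)
    beta_hat = sum(w * e for w, e in zip(weights, estimates))
    variance = sum(weights[i] * weights[j] * cov[i][j]
                   for i in range(k) for j in range(k))
    z = beta_hat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta_hat, variance, z, p_value

# Parameter order: lambda^{Z=1,Y=1}, lambda^{Y=1}, lambda^{X=1,Y=1}.
# Hypothetical estimates and covariance matrix (assumptions for illustration).
estimates = [0.9, -0.7, 0.6]
cov = [[0.040, -0.010, 0.005],
       [-0.010, 0.020, -0.008],
       [0.005, -0.008, 0.050]]
weights = [1, 2, 1]  # coefficients of the left-hand side of (18)

beta_hat, variance, z, p_value = z_test_linear_combination(estimates, cov, weights)
print(beta_hat, variance, z, p_value)
```

The quadratic form in `weights` expands to the same variance and covariance terms listed above, with coefficients 1, 4, 1 on the variances and 4, 2, 4 on the covariances.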









